Online Clustering Via Inequality

Author

  • Shriprakash Sinha
Abstract

Given an example-feature set representing the information context present in a dataset, is it possible to reconstruct that information context in the form of clusters, up to a certain degree of compromise, if the examples are processed randomly, without repetition, in a sequential online manner? A general transductive inductive learning strategy that uses a constraint-based multivariate Chebyshev inequality is proposed. Theoretical convergence of the reconstruction error to a finite value is shown as the number of (a) processed examples and (b) generated clusters increases. Upper bounds on these error rates are also proved. Nonparametric estimates of these errors, computed from a sample of random sequences of the example set, empirically point to a stable number of clusters.
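The abstract does not state the paper's exact constraint or update rules, so the following is only a hypothetical sketch of how a Chebyshev-style constraint can drive online clustering. It assumes a diagonal covariance per cluster, a hypothetical acceptance level `alpha`, and a hypothetical variance regulariser `init_var`; all function and class names are illustrative and are not the author's method. Examples are visited once each in a random order. An example joins the nearest cluster whose Chebyshev tail bound, `dim / D^2`, stays at or above `alpha`; otherwise it starts a new cluster.

```python
import random
from typing import List, Sequence, Tuple


class Cluster:
    """Hypothetical cluster: running per-dimension mean/variance via Welford's algorithm."""

    def __init__(self, x: Sequence[float], init_var: float) -> None:
        self.n = 1
        self.mean = list(x)
        self.m2 = [0.0] * len(x)
        self.init_var = init_var  # assumed regulariser so the variance is never zero

    def variance(self) -> List[float]:
        return [m / self.n + self.init_var for m in self.m2]

    def sq_distance(self, x: Sequence[float]) -> float:
        # Squared Mahalanobis distance under an assumed diagonal covariance.
        return sum((xi - mi) ** 2 / vi
                   for xi, mi, vi in zip(x, self.mean, self.variance()))

    def add(self, x: Sequence[float]) -> None:
        # Welford update of the running mean and sum of squared deviations.
        self.n += 1
        for j, xj in enumerate(x):
            delta = xj - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (xj - self.mean[j])


def chebyshev_tail_bound(d2: float, dim: int) -> float:
    # If mean and variances were the true ones, E[D^2] = dim, so Markov's
    # inequality gives the multivariate Chebyshev bound P(D^2 >= d2) <= dim / d2.
    return 1.0 if d2 <= 0 else min(1.0, dim / d2)


def online_chebyshev_clustering(data: Sequence[Sequence[float]], alpha: float = 0.1,
                                init_var: float = 1.0,
                                seed: int = 0) -> Tuple[List[int], List[Cluster]]:
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # random order, each example seen exactly once
    clusters: List[Cluster] = []
    labels = [-1] * len(data)
    for i in order:
        x = data[i]
        best, best_d2 = None, float("inf")
        for k, c in enumerate(clusters):
            d2 = c.sq_distance(x)
            # Constraint: a cluster is eligible only if x is not implausibly far from it.
            if chebyshev_tail_bound(d2, len(x)) >= alpha and d2 < best_d2:
                best, best_d2 = k, d2
        if best is None:
            clusters.append(Cluster(x, init_var))  # no eligible cluster: open a new one
            best = len(clusters) - 1
        else:
            clusters[best].add(x)
        labels[i] = best
    return labels, clusters
```

For example, `online_chebyshev_clustering([(0, 0), (0.5, -0.3), (100, 100), (100.4, 99.7)])` should place the two points near the origin in one cluster and the two near (100, 100) in another. Because the result depends on the visiting order, rerunning with different `seed` values and comparing the number of clusters mirrors, in a rough way, the abstract's idea of estimating errors over a sample of random sequences.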


Similar resources

A Model-Independent Measure of Regression Difficulty

Keywords: data mining, machine learning, model fitting, regression, exploratory data analysis, error rate estimation, data modeling, data cleaning, data preparation, predictability. We prove an inequality bound for the variance of the error of a regression function plus its non-smoothness as quantified by the Uniform Lipschitz condition. The coefficients in the inequality are calculated based on training ...


BotOnus: an online unsupervised method for Botnet detection

Botnets are recognized as one of the most dangerous threats to the Internet infrastructure. They are used for malicious activities such as launching distributed denial of service attacks, sending spam, and leaking personal information. Existing botnet detection methods produce a number of good ideas, but they are far from complete yet, since most of them cannot detect botnets in an early stage ...


Trees for Topic Detection

Extracting topic keywords from on-line text documents is highly significant in text mining applications. In our work, extracted keywords are represented as a hierarchical topic tree. For this, we basically use incremental clustering technique for incoming online documents. Moreover, we define a cluster-based measure similar to the tfidf measure and a probabilistic inequality to determine subsum...


Identification of T-S Fuzzy Classifier Via Linear Matrix Inequalities

In this paper a new linear matrix inequality (LMI) based design method for T-S fuzzy classifier is proposed. The various design factors including structure of fuzzy rule and various parameters should be considered to design T-S fuzzy classifier. To determine these design factors, we describe a new and efficient two-step approach that leads to good results for classification problem. At first, L...


Fast online graph clustering via Erdös-Rényi mixture

In the context of graph clustering, we consider the problem of estimating simultaneously both the partition of the graph nodes and the parameters of an underlying mixture of affiliation networks. In numerous applications the rapid increase of data size with time makes classical clustering algorithms too slow because of the high computational cost. In such situations online clustering algorithms...




Publication date: 2013